Fast Generation of Accurate Synthetic Microdata

Authors

  • Josep Maria Mateo-Sanz
  • Antoni Martínez-Ballesté
  • Josep Domingo-Ferrer

Abstract

Generation of a synthetic microdata set that reproduces the statistical properties of an original microdata set is a promising approach to statistical disclosure control (SDC) of microdata. In this paper, a new method for generating continuous synthetic microdata is proposed. The covariance matrix and the univariate statistics of the original data set are exactly preserved. The method is non-iterative and its complexity grows linearly with the number of records to be protected.
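The abstract does not spell out the algorithm, so the following is only a minimal sketch of one standard non-iterative way to obtain synthetic continuous data whose mean vector and covariance matrix match those of the original (up to floating-point rounding). It is not the authors' method. The function name `synthesize`, the `seed` parameter, and the use of Gaussian noise are assumptions made for illustration; the sketch also assumes more records than variables and a positive definite covariance matrix, and it matches only first and second moments, not other univariate statistics.

```python
import numpy as np


def synthesize(original, seed=0):
    """Return synthetic records with the same sample mean and covariance.

    Illustrative sketch only (hypothetical helper, not the paper's algorithm).
    Assumes n > d and a positive definite sample covariance matrix.
    """
    x = np.asarray(original, dtype=float)
    n, d = x.shape
    rng = np.random.default_rng(seed)

    # Statistics of the original data that the output must reproduce.
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))

    # Random noise, centred so its sample mean is exactly zero.
    noise = rng.standard_normal((n, d))
    noise -= noise.mean(axis=0)

    # Whiten the noise: after this step its sample covariance is the identity.
    l_noise = np.linalg.cholesky(np.atleast_2d(np.cov(noise, rowvar=False)))
    white = np.linalg.solve(l_noise, noise.T).T

    # Recolour with the Cholesky factor of the original covariance,
    # then shift to the original mean.
    l_orig = np.linalg.cholesky(cov)
    return white @ l_orig.T + mean
```

Every step is a single pass over the n records plus operations on d-by-d matrices, so the cost of this sketch is O(n d^2): linear in the number of records, with no iteration.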

Related resources

Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets

Previous work by these authors has been directed to measuring the performance of microdata masking methods in terms of information loss and disclosure risk. Based on the proposed metrics, we show here how to improve the performance of any particular masking method. In particular, post-masking optimization is discussed for preserving as much as possible the moments of first and second order (and...

Information Loss in Continuous Hybrid Microdata: Subdomain-Level Probabilistic Measures

The goal of privacy protection in statistical databases is to balance the social right to know and the individual right to privacy. When microdata (i.e. data on individual respondents) are released, they should stay analytically useful but should be protected so that it cannot be decided whether a published record matches a specific individual. However, there is some uncertainty in the assessme...

Development of Synthetic Microdata for Educational Use in Japan

Japan’s new Statistics Act came fully into effect in April 2009. The new law allows access to anonymized microdata, but it also requires users to go through an application process and imposes some restrictions. The National Statistics Center (NSTAC) has developed a type of microdata which can be accessed without an application process and used without restrictions. These data do...

Microdata Protection

Governmental, public, and private organizations are more and more frequently required to make data available for external release in a selective and secure fashion. Most data are today released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will co...

Synthetic Data Generation using Benerator Tool

Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are needed to verify new approaches under certain conditions. Given these limitations, the use of synthetic data is a viable alternative to complement the real ...

Publication date: 2004